First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations

Authors

  • Guillermo Garcia-Hernando
  • Shanxin Yuan
  • Seungryul Baek
  • Tae-Kyun Kim

Abstract

In this work we study the use of 3D hand poses to recognize first-person hand actions that involve interacting with 3D objects. Towards this goal, we collected RGB-D video sequences comprising more than 100K frames of 45 daily hand action categories, involving 25 different objects in several hand grasp configurations. To obtain high-quality hand pose annotations from real sequences, we used our own mo-cap system, which automatically infers the location of each of the 21 joints of the hand via 6 magnetic sensors on the fingertips and the inverse kinematics of a hand model. To the best of our knowledge, this is the first benchmark for RGB-D hand action sequences with 3D hand poses. Additionally, we recorded 6D object poses (i.e. 3D rotations and locations) and provide 3D object models for a subset of the hand-object interaction sequences. We present extensive experimental evaluations of RGB-D and pose-based action recognition using 18 baselines and state-of-the-art methods. We measure the impact of using appearance features, poses, and their combinations, and we evaluate different training/testing protocols, including a cross-person protocol. Finally, we assess how ready current hand pose estimation is when hands are severely occluded by objects in egocentric views, and what influence it has on action recognition. The results show clear benefits of using hand pose as a cue for action recognition compared to other data modalities. Our dataset and experiments may be of interest to the 6D object pose, robotics, and 3D hand pose estimation communities, as well as to action recognition.
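To make "using 3D hand pose as a cue for action recognition" concrete, here is a minimal, hypothetical sketch of a pose-based pipeline over sequences of 21-joint hand poses. It is not one of the paper's 18 evaluated methods. The joint ordering (index 0 as the wrist), the mean/std temporal pooling, and the 1-nearest-neighbor classifier are all illustrative assumptions, and every function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]   # (x, y, z) position of one hand joint
Frame = Sequence[Joint]              # one hand pose: 21 joints
NUM_JOINTS = 21
WRIST = 0  # ASSUMPTION: joint index 0 is the wrist (hypothetical ordering)


def normalize_frame(frame: Frame) -> List[float]:
    """Express every joint relative to the wrist and flatten to 63 values.

    Subtracting the wrist removes the global hand translation, so the
    feature describes hand shape rather than where the hand is in space.
    """
    if len(frame) != NUM_JOINTS:
        raise ValueError(f"expected {NUM_JOINTS} joints, got {len(frame)}")
    wx, wy, wz = frame[WRIST]
    out: List[float] = []
    for x, y, z in frame:
        out.extend((x - wx, y - wy, z - wz))
    return out


def sequence_descriptor(seq: Sequence[Frame]) -> List[float]:
    """Pool per-frame features over time into one fixed-length vector.

    Concatenates the per-dimension mean and (population) standard
    deviation, giving 2 * 63 = 126 values regardless of sequence length.
    """
    if not seq:
        raise ValueError("empty sequence")
    frames = [normalize_frame(f) for f in seq]
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    std = [math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / n)
           for d in range(dim)]
    return mean + std


def nearest_neighbor(query: Sequence[Frame],
                     gallery: Sequence[Tuple[str, Sequence[Frame]]]) -> str:
    """Return the label of the gallery sequence closest in descriptor space."""
    q = sequence_descriptor(query)
    best_label, best_dist = "", math.inf
    for label, seq in gallery:
        d = math.dist(q, sequence_descriptor(seq))  # Euclidean distance
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label
```

For example, a sequence whose fingers spread over time can be matched against labeled training sequences with `nearest_neighbor(query, [("still", still_seq), ("open", opening_seq)])`. Because each frame is made wrist-relative, shifting the whole hand in space leaves the descriptor unchanged; the temporal mean and standard deviation are a deliberately simple stand-in for the learned temporal models a real benchmark would compare.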

Similar resources

Hand Gesture Recognition from RGB-D Data using 2D and 3D Convolutional Neural Networks: a comparative study

Despite considerable advances in recognizing hand gestures from still images, there are still many challenges in the classification of hand gestures in videos. The latter comes with more challenges, including higher computational complexity and arduous task of representing temporal features. Hand movement dynamics, represented by temporal features, have to be extracted by analyzing the total fr...

3D Hand Pose Detection in Egocentric RGB-D Images

We focus on the task of everyday hand pose estimation from egocentric viewpoints. For this task, we show that depth sensors are particularly informative for extracting near-field interactions of the camera wearer with his/her environment. Despite the recent advances in full-body pose estimation using Kinect-like sensors, reliable monocular hand pose estimation in RGB-D images is still an unsolv...

Learning Human Pose Models from Synthesized Data for Robust RGB-D Action Recognition

We propose Human Pose Models that represent RGB and depth images of human poses independent of clothing textures, backgrounds, lighting conditions, body shapes and camera viewpoints. Learning such universal models requires training images where all factors are varied for every human pose. Capturing such data is prohibitively expensive. Therefore, we develop a framework for synthesizing the trai...

Online 3D Reconstruction and 6-DoF Pose Estimation for RGB-D Sensors

In this paper, we propose an approach to Simultaneous Localization and Mapping (SLAM) for RGB-D sensors. Our system computes 6-DoF pose and sparse feature map of the environment. We propose a novel keyframe selection scheme based on the Fisher information, and new loop closing method that utilizes feature-to-landmark correspondences inspired by image-based localization. As a result, the system ...

Trajectory aligned features for first person action recognition

Egocentric videos are characterised by their ability to have the first person view. With the popularity of Google Glass and GoPro, use of egocentric videos is on the rise. Recognizing action of the wearer from egocentric videos is an important problem. Unstructured movement of the camera due to natural head motion of the wearer causes sharp changes in the visual field of the egocentric camera c...

Journal title:
  • CoRR

Volume abs/1704.02463

Publication date 2017